Natural gradient descent is a principled method for adapting the parameters of a statistical model on-line, using an underlying Riemannian parameter space to redefine the direction of steepest descent. The algorithm is examined via methods of statistical physics, which accurately characterize both transient and asymptotic behavior. A solution of the learning dynamics is obtained for the case of multilayer neural network training in the limit of large input dimension. We find that natural gradient learning leads to optimal asymptotic performance and outperforms gradient descent during the transient, significantly shortening or even removing the plateaus in generalization performance that typically hamper gradient descent training.
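For concreteness, the update rule alluded to above is the standard natural gradient form (this equation does not appear in the abstract itself): the ordinary gradient of the loss L is preconditioned by the inverse of the Fisher information matrix G, which plays the role of the Riemannian metric on parameter space,

\[
\theta_{t+1} \;=\; \theta_t \;-\; \eta\, G(\theta_t)^{-1}\, \nabla_{\theta} L(\theta_t),
\qquad
G(\theta) \;=\; \mathbb{E}\!\left[ \nabla_{\theta} \log p(x;\theta)\, \nabla_{\theta} \log p(x;\theta)^{\top} \right],
\]

where \eta denotes the learning rate. Preconditioning by G^{-1} makes the step invariant to reparameterization of the model, which is what distinguishes natural gradient descent from ordinary (Euclidean) gradient descent.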